Tree Canopy and it’s Impact on the Urban Heat Effect

Authors
Affiliation

Eleanor Lindsey

Colorado State University

Clara Jordan

Colorado State University

Sierra Champion

Colorado State University

The Relationship of Tree Canopy to Urban Heat Effect and Inequality: A Case Study

ABSTRACT

This study investigates the relationship between tree canopy cover, urban temperature, and income levels in Los Angeles, California, with a focus on the Urban Heat Island (UHI) effect. As cities like Los Angeles continue to expand and global temperatures rise, urban areas are experiencing higher heat exposure, particularly in neighborhoods with limited green space. We aimed to test two main hypotheses: (1) that lower tree canopy coverage would be associated with higher temperatures, and (2) that lower canopy coverage would be correlated with lower income, based on the “luxury effect” described in previous research. Using data from 2003 to 2023, including average annual temperatures, tree canopy loss, and 2023 census tract median incomes, we conducted visual and statistical analyses to explore these relationships. While visualizations suggested a strong association between canopy loss and temperature, the Pearson correlation test did not yield statistically significant results. Additionally, our analysis found no measurable correlation between tree canopy and income, although this may be due to limitations in the data structure. Despite these challenges, our findings reinforce the importance of urban tree canopies in regulating temperature and highlight the need for more targeted, high-resolution data to understand environmental inequalities in urban heat exposure fully.

INTRODUCTION & HYPOTHESIS

Due to the effects of climate change, large cities like Los Angeles, California, are expected to see regularly increasing temperatures. When looking at a heat map, an odd phenomenon shows higher temperatures within city limits than in rural areas on the border. This unequal heating is called the Urban Heat Island (UHI) effect (HeatIslandHealthEffects?). This is due to reduced vegetation cover, increased concrete, disruptive buildings, and human activities. This effect is projected to worsen as more of the world’s population migrates to urban environments, cities continue to grow, and global temperatures rise (HeatIslandTrends?) Areas already impacted by the UHI effect will “bear the brunt” of intensified heat waves brought on by climate change (HeatIslandTrends?).

From the 1980s onward, environmental movements began to advocate for increased green space and sustainability in urban planning. Initiatives promoting tree canopy restoration, urban greening, and park creation have been prioritized to mitigate the impacts of climate change and improve public health. Notably, the city has recognized the need to address the disproportionate effects of urban heat on low-income communities. However, despite these efforts, many neighborhoods still struggle with heat exposure and lack adequate vegetation, emphasizing the need for continued research and policy reform to ensure equitable access to urban green spaces.

During this period, a growing awareness of environmental issues began to emerge. Land-use planning and urban development often neglect the integration of green spaces, leading to stark inequalities within the city. Areas predominantly inhabited by low-income communities and people of color were typically under served regarding infrastructure and green spaces, resulting in disparities in health outcomes and environmental benefits compared to wealthier neighborhoods (UrbanGreenSpaceInequality?).

Even before considering climate change, the urban heat island effect can have severe impacts, including changing urban ecosystems and biodiversity and exacerbating human health issues. Increasing urban temperatures can lead to health impacts, including increased exposure to extreme heat, air pollution, heat stroke, cardiorespiratory mortality, kidney disease, and mental illness, among others (HeatIslandHealthEffects?).

One of the primary benefits of urban green spaces is their ability to cool the surrounding environment. Trees and vegetation play a vital role in reducing temperatures through evapotranspiration, where water is absorbed by the roots and released as water vapor from the leaves. This natural cooling effect can significantly lower urban temperatures, helping to combat the extreme heat conditions associated with the Urban Heat Island (UHI) effect. Research shows that areas with denser tree canopies can experience 5-10 degrees Fahrenheit temperature reductions compared to surrounding asphalt or concrete regions (CoolingEffectsOfTrees?).

Additionally, urban green spaces contribute to improved air quality. Trees act as natural air filters by absorbing pollutants and providing oxygen. Green spaces have been linked to reductions in air pollutants like carbon dioxide, nitrogen dioxide, and particulate matter, thereby promoting healthier urban environments (UrbanGreenSpacesAirQuality?). Studies indicate that residents living near green spaces report better physical and mental health outcomes, likely due to increased opportunities for physical activity, social interaction, and psychological restoration provided by nature (HealthBenefitsOfGreenSpaces?).

Los Angeles is a prime example of the UHI effect as it is a large city housing millions of people with a tall, dense skyline. Asphalt and blacktop trap a significant amount of heat, making the temperatures within the five counties that make up the metropolis hotter than those in surrounding rural areas (LAHeatIsland?). In this study, we aim to investigate how the urban heat island effect in Los Angeles is related to tree cover and income. We intend to explore if tree cover and temperature are related, and how this may be spatially associated with income. We were inspired by Schell et al. 2020 b’s thoughts about the luxury effect, the idea that urban biodiversity is positively correlated with neighborhood wealth (RacismInUrbanEnvironments?). This curiosity raised questions like: How will climate change exacerbate urban heat? Can we identify neighborhoods in Los Angeles at higher risk of urban heat health impacts? To investigate this question, the objective of our study was to answer the following question: Does decreased tree canopy contribute to increased urban temperatures and correlate with median income in Los Angeles? Our hypotheses include:

  1. As tree canopy decreases, the mean annual temperature will increase because of the urban heat island effect.

  2. As tree canopy decreases, the mean annual income will decrease due to the historical marginalization of low-income home communities in neighborhoods with little green space, per the luxury effect (RacismInUrbanEnvironments?).

To explore these hypotheses, we obtained Los Angeles-centric data on yearly average temperatures in Los Angeles from 2013 - 2023 (TemperatureData?), tree canopy cover loss from 2001 - 2023 (LATreeLoss?), and median income for 2023 (MedianIncomeCensusTract?). With R as our analysis platform, we combined these datasets and visually analyzed the relationships between our three variables using scatter plots.

METHODS

We have compiled data sets on air temperature records, median income, and land cover. Our question pertains to the heat island effect, which is correlated to the change in air temperature within city limits. We are also looking into median income to see how the heat island effect impacts different people. The land cover data is added for further questions about how landforms impact temperature and urban sprawl. To analyze this data, we plan to use Dplyr to clean it, such as selecting, filtering, and omitting NA values. Because we are using census data, we expect different periods for the datasets. The median income data only has a five-year range, while the temperature and land cover data span from 2003 to 2023. 

Once our data was cleaned, we joined our data sets (temperature, land cover, and income) using a left join on the “year” column. This was done to create a unified data frame to analyze. One challenge we anticipate is joining these data sets because our data sets focus on a variety of data types, and we may need to create new columns that are common to all data sets so we can execute a join. After analyzing the data frames, we found that each set had a year column that could be used to join. We are operating under the assumption that we have everything we need; however, once our datasets are joined, we may need supplemental data on population density to contextualize heat island increases and income distribution. 

This joined data frame was then analyzed to answer our question: Does decreased tree canopy contribute to increased urban temperatures and correlate with median income? To answer this question, wel: 

  1. Tested for correlation between tree canopy and mean annual temperature in Los Angeles using the correlation function. This corresponded to Hypothesis A (see Introduction). 

  2. Tested for correlation between tree canopy and mean annual income in Los Angeles using the correlation function. This corresponded to Hypothesis B (see Introduction). 

These methods helped us answer our question by revealing potential relationships between tree canopy, mean annual temperature, and income. This allowed us to draw conclusions about which communities in Los Angeles should implement adaptation measures as climate change increases temperatures and exacerbates the urban heat island effect, along with its negative health impacts.

RESULTS

This study found that as the tree canopy decreases, mean annual temperature will increase, and discerned no relationship between tree canopy and median income. To test Hypothesis A, as tree canopy decreases, the mean annual temperature will increase because of the urban heat island effect, we first explored a comparison between urban and suburban temperature data from downtown Los Angeles and Malibu Hills, respectively. This comparison revealed a stark contrast between suburban and urban temperatures across the twenty-year time series from 2003 to 2023. Consistently throughout this time series, the Malibu Hills temperature records were lower than those for downtown Los Angeles, as summarized in Figure 1 below.

Figure 1:

library(here)
library(dplyr)
library(readr)
library(ggplot2)
library(lubridate)
library(tidyverse)
library(plotly)
temp_data1<-read_csv(here("data/2003-2005 Temperature Data(in).csv"))
temp_data2<-read_csv(here("data/2006-2012 Temperature Data(in).csv"))
temp_data3<-read_csv(here("data/2013-2019 Temperature Data(in).csv"))
temp_data4<-read_csv(here("data/2020-2023 Temperature Data(in).csv"))
temperature_data<-bind_rows(temp_data1, temp_data2, temp_data3, temp_data4)%>%
  rename('Year'='DATE')%>%
  select(-STATION)%>%
  na.omit()

temperature_data$Year <- as.numeric(temperature_data$Year)

str(temperature_data$Year)
 num [1:745] 2003 2004 2005 2003 2004 ...
downtown_stations <- c("LOS ANGELES DOWNTOWN USC, CA US", "MALIBU HILLS CALIFORNIA, CA US")
temperature_data_filtered <- temperature_data %>%
  filter(NAME %in% downtown_stations)

station_labels<- c("LOS ANGELES DOWNTOWN USC, CA US"="Downtown LA","MALIBU HILLS CALIFORNIA, CA US"='Malibu Hills' )

Temperature_plot <- ggplot(temperature_data_filtered, aes(x = Year, y = TAVG, color = NAME)) +
  geom_line() +
  scale_x_continuous(
    breaks = seq(2003, 2023, by = 4), 
    labels = seq(2003, 2023, by = 4)) + 
  scale_color_manual(values = c("dodgerblue", "firebrick"), labels = station_labels) +
  labs(
    title = "Average Temperature in Los Angeles Downtown Stations",
    x = "Year",
    y = "Temperature (°F)"
  ) +
  theme_minimal(base_size = 14)

ggplotly(Temperature_plot)
Source: Article Notebook

In Figure 1 the temperature data is categorized by a yearly average temperature in LA in degrees Fahrenheit. These results confirm the existence of the UHE, characterized by higher and less variable temperatures in the urban environment than the suburban environment. This study then incorporated tree canopy loss as a variable influencing temperature. Figure 2 visualizes trends in tree canopy loss in Los Angeles over the same time series considered for temperature data. We can see continual tree canopy loss over the twenty years considered, characteristic of urban environments which continue to develop as population increases over time. Spikes indicate periods of elevated tree canopy loss, such as in 2009 and 2020.

Figure 2:

tree_canopy_data<-read_csv(here("data/Tree_canopy_loss/treecover_loss__ha.csv"))%>%
  select(-adm1, -adm2, -iso)%>%
  rename('Year'='umd_tree_cover_loss__year','Tree_loss_ha'='umd_tree_cover_loss__ha' )
Tree_loss_plot<-ggplot(tree_canopy_data, aes(x=Year, y=Tree_loss_ha))+
  geom_area(fill="forestgreen",color="forestgreen")+
  labs(x="Year", y="Tree Canopy Loss (ha)", title="Tree Canopy Loss in LA per Year")

Tree_loss_plot

Given the hypothesized relationship between tree canopy loss and temperature due to the UHE, it is no surprise that this study also found a very strong visual correlation between these two variables. As seen below in Figure 3, temperature trends follow those of tree canopy loss in Los Angeles. While the spikes and dips are not to the same order of magnitude between these two variables, it is evident that increases in tree canopy cover loss impact temperature. For example, in 2009, there was a large decrease in tree canopy cover of over 20,000 hectares. After this spike, temperature readings in both downtown Los Angeles and Malibu Hills increase steadily for approximately the next five years. It is important to note that throughout these changes in tree canopy cover that the variation in Malibu Hills temperature records is less than that of downtown Los Angeles, further demonstrating the regulatory and cooling impact of tree canopy in developed environments.

Figure 3:

median_income_data <- read_csv(here("data/Median_Income_and_AMI_(census_tract).csv"))%>%
  select(-tract, -sup_dist, -ESRI_OID, -Shape__Area, -Shape__Length )
combined_Temp_tree_data<-left_join(tree_canopy_data,temperature_data, by='Year')%>%
  rename('Temp_average'='TAVG', 'Station_name'='NAME')
combined_income_tree_temp <- cross_join(combined_Temp_tree_data, median_income_data)
combined_income_tree_temp <- combined_income_tree_temp[-c(1:4990), ]

station_labels <- c(
  "LOS ANGELES DOWNTOWN USC, CA US" = "Downtown LA",
  "MALIBU HILLS CALIFORNIA, CA US" = "Malibu Hills")

filtered_data_combined <- combined_income_tree_temp %>%
  filter(Station_name %in% c("LOS ANGELES DOWNTOWN USC, CA US", "MALIBU HILLS CALIFORNIA, CA US"))


scale_factor <- max(combined_income_tree_temp$Tree_loss_ha, na.rm = TRUE) /
                max(combined_income_tree_temp$Temp_average, na.rm = TRUE)

# Plot
combined_plot <- ggplot(filtered_data_combined, aes(x = Year)) +
  # Tree loss line
  geom_line(aes(y = Tree_loss_ha), color = "forestgreen", size = 1) +
  
  # Temperature lines by station
  geom_line(aes(y = Temp_average * scale_factor, color = Station_name), size = 1) +

  scale_x_continuous(
    breaks = seq(2003, 2023, by = 2),
    labels = seq(2003, 2023, by = 2)
  ) +
  scale_y_continuous(
    name = "Tree Canopy Loss (ha)",
    sec.axis = sec_axis(~./scale_factor, name = "Average Temperature (°F)")
  ) +
  scale_color_manual(
    values = c("LOS ANGELES DOWNTOWN USC, CA US" = "dodgerblue",
               "MALIBU HILLS CALIFORNIA, CA US" = "firebrick"),
    labels = station_labels,
    name = "Weather Station"
  ) +
  theme_minimal() +
  theme(
    axis.title.y.left = element_text(color = "forestgreen"),
    axis.text.y.left = element_text(color = "forestgreen"),
    axis.title.y.right = element_text(color = "black"),
    axis.text.y.right = element_text(color = "black"),
    legend.position = "bottom"
  ) +
  labs(
    title = "Tree Loss and Average Temperature by Year",
    x = "Year"
  )
ggplotly(combined_plot)

This relationship between tree canopy loss and temperature, however, was surprisingly not confirmed by a correlation test performed between the two variables. A Pearson’s product-moment correlation test revealed a t-value of approximately 0.69 and a p-value of approximately 0.48. T-values represent the “test statistic,” which varies from a plausible range of 0 to 1, with 0 representing no correlation and 1 representing perfect correlation (which rarely occurs). Thus, a t-value of 0.69 between tree canopy loss and temperature indicates a correlation between the two that is not particularly strong. This relative weakness of correlation is confirmed by the p-value of 0.48, which, being greater than the critical value of 0.05, is indicative of a statistically insignificant result. The failure of this statistical test does not mean that there is no relationship between these variables. The Pearson’s product-moment correlation test involves several assumptions that would have to be further investigated to produce a solid conclusion on whether Hypothesis A is rejected or not. These assumptions include variable linearity, variance homogeneity, and that there are no outlier values. The vast difference in the order of magnitudes between tree canopy loss and temperature may have influenced the results of the correlation test.

Figure 4:

combined_Temp_tree_data<-left_join(tree_canopy_data,temperature_data, by='Year')%>%
  rename('Temp_average'='TAVG', 'Station_name'='NAME')

Combined_cor<-cor.test(combined_Temp_tree_data$Temp_average, combined_Temp_tree_data$Tree_loss_ha)

print(Combined_cor)

    Pearson's product-moment correlation

data:  combined_Temp_tree_data$Temp_average and combined_Temp_tree_data$Tree_loss_ha
t = 0.69235, df = 743, p-value = 0.4889
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.04652181  0.09704331
sample estimates:
       cor 
0.02539167 

In addition to exploring Hypothesis A, this study also investigated Hypothesis B, as tree canopy decreases, mean annual income will decrease due to the historical marginalization of low-income home communities in neighborhoods with little green space, in accordance to ideas brought forward by the luxury effect. To test this hypothesis, as outlined in the Methods section, we performed yet another correlation test. This correlation test, between tree canopy loss and median income, produced a curious result as seen below in Figure 5.

Figure 5:

median_income_data <- read_csv(here("data/Median_Income_and_AMI_(census_tract).csv"))%>%
  select(-tract, -sup_dist, -ESRI_OID, -Shape__Area, -Shape__Length )
tree_income_cor<- cor(combined_income_tree_temp[,c("Tree_loss_ha", "med_hh_income")])


print(tree_income_cor)
              Tree_loss_ha med_hh_income
Tree_loss_ha             1            NA
med_hh_income           NA             1

A result of “NA” for correlating tree canopy loss to median income has several explanations. This result can occur if there was a data cleaning error and the correlation test was performed on data with “Not Available” values, however, this study corrected the presence of NA’s during data cleaning. Other possible causes could include that our variables had very little variance, data with too few observations, or data type issues. Much like the NA value issue, this study, when carrying out its method,s confirmed that these issues were corrected. Finally, the last explanation is that R, when computing the correlation, created an output of NA to indicate that the two variables being compared were too different to compare. From these, statistically, this study found that Hypothesis B should be rejected. However, the authors of this study disagree with this statistical conclusion due to potential assumptions in the correlation test and evidence in existing literature that suggests a relationship between tree canopy and income. This will be discussed in the Discussion & Conclusions section below.

DISCUSSION & CONCLUSIONS

Our study examined the relationship between tree canopy cover, temperature, and income in Los Angeles, focusing on how the Urban Heat Island (UHI) effect may impact different communities. While our visual data showed clear trends, such as how Downtown LA is consistently hotter than suburban areas like Malibu Hills, and how tree canopy loss has increased over time, the results from our statistical tests were more mixed.


For Hypothesis A, we expected to see a strong connection between decreasing tree canopy and increasing temperature, and visually, that relationship was there. For example, years with significant spikes in canopy loss, like 2009, were followed by steady increases in average temperature. However, when we ran a Pearson correlation test, the results weren’t statistically significant. We think this might be because the correlation test assumes a linear relationship and consistent variance, which may not apply to our data. It also doesn’t capture delayed effects, which could be important here, like temperature rising in the years after trees are removed.


Hypothesis B, which suggested that lower tree canopy would correlate with lower income, ended up giving us an “NA” in our correlation test. This could be due to several factors, such as mismatched data formats or variables that don’t align well. But from everything we’ve read and seen in class, it’s clear that tree cover and green space are not evenly distributed in LA, and that environmental inequality is a real issue. So, while the data didn’t support Hypothesis B statistically, we’re hesitant to reject it completely without further investigation.


There were some limitations to our analysis. One of the biggest was the mismatch in spatial and temporal scales across datasets. For example, our temperature data was organized by year and station, but our income data was tied to census tracts. That made it difficult to compare or join the datasets in a meaningful way directly. Also, some of the environmental patterns we were looking at might take years to play out—something our current time frame might not fully capture.


Despite these challenges, the project demonstrates how useful it can be to combine different types of data to examine environmental justice issues. It also highlighted the importance of visualizations in helping us see patterns that statistics might miss on their own. From a data science perspective, it was a good reminder that correlation doesn’t always tell the whole story, especially when dealing with complex systems like climate, urban design, and social inequality.


If we continue this research, we would want to use more localized data, such as satellite images of tree cover by neighborhood, and include population density or even heat-related health data to deepen the analysis. It would also be helpful to explore other statistical models that can handle non-linear or lagged relationships.


Overall, even though our hypotheses weren’t fully supported statistically, our findings still suggest that tree canopy plays a significant role in regulating temperature, and that communities with limited access to green space may face greater climate-related risks. As the climate continues to warm, making cities cooler and more livable should be a top priority, especially for neighborhoods that have historically been underserved.